The potential of text mining in data integration and network biology for plant research: a case study on Arabidopsis.

نویسندگان

  • Sofie Van Landeghem
  • Stefanie De Bodt
  • Zuzanna J Drebert
  • Dirk Inzé
  • Yves Van de Peer
چکیده

Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for usage in plant network analyses. Furthermore, we combine text mining information with both protein-protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, hereby stimulating the application of text mining data in future plant biology studies.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A comparison between knowledge-driven fuzzy and data-driven artificial neural network approaches for prospecting porphyry Cu mineralization; a case study of Shahr-e-Babak area, Kerman Province, SE Iran

The study area, located in the southern section of the Central Iranian volcano–sedimentary complex, contains a large number of mineral deposits and occurrences which is currently facing a shortage of resources. Therefore, the prospecting potential areas in the deeper and peripheral spaces has become a high priority in this region. Different direct and indirect methods try to predict promising a...

متن کامل

Identification of miR-24 and miR-137 as novel candidate multiple sclerosis miRNA biomarkers using multi-staged data analysis protocol

Many studies have investigated misregulation of miRNAs relevant to multiple sclerosis (MS) pathogenesis. Abnormal miRNAs can be used both as candidate biomarker for MS diagnosis and understanding the disease miRNA-mRNA regulatory network. In this comprehensive study, misregulated miRNAs related to MS were collected from existing literature, databases and via in silico prediction. A multi-staged...

متن کامل

Situation and Text: Representation of Migrants Whilst the Escalation of Refugee Crisis in Great Britain as Compared to Russia

Increasing migration is a vital concern for a globalizing sociocultural environment in today’s world. The UK and developed European countries have become an attractive destination for asylum seekers (labelled as “migrants”) in the past decade. The rapid rise in the number of asylum seekers, which was labelled “migration crisis” (Ruz, 2015), made this topic an integral part of scientific discuss...

متن کامل

Isolation and molecular characterization of the RecQsim gene in Arabidopsis, rice (Oryza sativa) and rape (Brassica napus)

In any organism that reproduces sexually, DNA Recombination plays vital roles in the generation of allelic diversity as well as in preservation of genome fidelity. Genome fidelity is particularly important in plants because mutations occurring during the development of flowering plants are heritable and can be passed onto the next generation. One of the gene families that play crucial roles in ...

متن کامل

Comparison of Three Decision-Making Models in Differentiating Five Types of Heart Disease: A Case Study in Ghaem Sub-Specialty Hospital

Introduction: cardiovascular diseases are becoming the main cause of mortality and morbidity in most countries. This research goal was to predict the types of heart diseases for more accurate diagnosis by data mining and neural network technics. Method: This research was an applied-survey study and after data preprocessing, three approaches of neural network, decision making tree and Bayes simp...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • The Plant cell

دوره 25 3  شماره 

صفحات  -

تاریخ انتشار 2013